Robust Chinese Word Segmentation with Contextualized Word Representations

Title | Description | Date
Robust Chinese Word Segmentation with Contextualized Word Representations | Original paper | 2019-01-17
pywordseg | Implementation of the paper: "Open Source State-of-the-art Chinese Word Segmentation System with BiLSTM and ELMo." | 2019-01-20

Abstract
In recent years, neural-network-based methods have greatly improved the accuracy of Chinese word segmentation. However, the error rate on out-of-vocabulary (OOV) words remains high. We use a simple bidirectional LSTM architecture and a large-scale pretrained language model to generate high-quality contextualized character representations. These representations mitigate the ambiguity that is widespread among individual Chinese characters, and hence effectively reduce the OOV error rate. State-of-the-art performance is achieved on many datasets.

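The abstract describes producing one contextualized representation per character. In the usual sequence-labeling view of word segmentation, a tagger built on such representations assigns each character a boundary tag, and the words are then read off the tag sequence. This excerpt does not say which tag scheme the paper uses, so the B/M/E/S scheme (begin, middle, end, single) and the helper name `tags_to_words` below are assumptions made for illustration. It is a minimal sketch of the decoding step only, not the paper's model.

```python
from typing import List


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Group characters into words from per-character B/M/E/S tags.

    Hypothetical helper: the B/M/E/S scheme is an assumption,
    not something stated in the summarized paper excerpt.
    """
    if len(chars) != len(tags):
        raise ValueError("chars and tags must have the same length")

    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        if tag in ("B", "S") and current:
            # A new word starts here, so flush any unfinished word first.
            # This keeps malformed tag sequences from silently merging words.
            words.append(current)
            current = ""
        current += ch
        if tag in ("E", "S"):
            # The word ends at this character.
            words.append(current)
            current = ""
    if current:
        # Flush a trailing word that never received an E tag.
        words.append(current)
    return words


print(tags_to_words("今天天气真好", ["B", "E", "B", "E", "S", "S"]))
# ['今天', '天气', '真', '好']
```

Because every character gets a tag regardless of whether its word was seen in training, this formulation can in principle segment OOV words. How well it does so depends on the quality of the character representations, which is the point the abstract emphasizes.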

Model architecture
